optimizing binary size

2022-02-20 · 6 min read

Setup #

# for running cargo bloat
$ RUSTFLAGS="-C target-cpu=native" cargo install cargo-bloat

# for running cargo size (among other things)
$ rustup component add llvm-tools-preview
$ rustup +nightly component add llvm-tools-preview # (for strip=symbols)
$ RUSTFLAGS="-C target-cpu=native" cargo install cargo-binutils

TL;DR #

# compile and sort functions by binary size
$ cargo bloat --release -n 100

# only like 20-25% of the binary size seems to be our code or other relevant
# stuff like ndarray. The rest seems to be mostly panic and fmt
# infrastructure...

# compile and sort crates by binary size
$ cargo bloat --release --crates

# print the size of each section in the binary
$ cargo size --bin my-bin --release -- -A

# if you already have a built rust binary, you can run
# rust-size directly:
$ rust-size -A target/release/my-bin

# rust-size with nice sorted and human-readable output:
$ rust-size -A target/release/my-bin \
	| tail -n +2 \
	| sort --numeric-sort --key=2 \
	| numfmt --header=3 --to=iec-i --suffix=B --field=2

# strip debug info and all symbols (requires nightly) then print section size
$ RUSTFLAGS="-Z strip=symbols -C target-cpu=native" cargo +nightly \
	size --bin my-bin --release -- -A

# before: 1.8 MiB! looking at the sections, it's mostly debug info.
# "-Z strip=symbols" brings this down to like 330-400 KiB (depending on
# other flags etc...)

# TODO: cargo-binutils also installed `cargo strip`; maybe that's helpful?

Cargo.toml #

[profile.release]
codegen-units = 1
lto = true
panic = "abort"
# opt-level = "s" # optimize for size
# opt-level = "z" # optimize for size, also turn off loop vectorization
opt-level = 3
debug = 0

Compile std with panic = "abort" #

  • shaves off maybe 150 KiB?
  • removes a decent chunk of the backtrace/unwind infrastructure

# .cargo/config.toml
[unstable]
build-std = ["std", "panic_abort"]
build-std-features = [] # <- turns off backtrace+unwind features
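
A note on actually building with this config: as far as I know, build-std needs the nightly toolchain, the rust-src component, and an explicit --target. Here's a minimal sketch of that, where host_triple and build_with_std are made-up helper names (not part of cargo or rustup):

```shell
# Print the host target triple from `rustc -vV` output read on stdin.
host_triple() {
	awk '$1 == "host:" { print $2 }'
}

# Hypothetical helper: build using the [unstable] build-std settings from
# .cargo/config.toml. Assumes rustup manages the toolchains.
build_with_std() {
	local triple
	triple="$(rustc +nightly -vV | host_triple)" || return 1
	# build-std compiles std from source, so the sources must be installed
	rustup component add rust-src --toolchain nightly || return 1
	# build-std requires an explicit --target, even for the host platform
	cargo +nightly build --release --target "$triple"
}
```

With an explicit --target, the binary lands in target/<triple>/release/ rather than target/release/, so point rust-size at that path instead.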

WASM #

https://rustwasm.github.io/twiggy/

Example: fixing bloat #

Let me just run a quick smoketest (which depends on almost every crate in the monorepo)...

$ cargo test -p smoketest
# ..
    Finished test [unoptimized + debuginfo] target(s) in 1m 28s
     Running unittests src/lib.rs (target/debug/deps/smoketest-bd637d7668a0b714)
# ..

Man that sure took a while to link, I wonder how big the binary is?

$ ls -lah target/debug/deps/smoketest-bd637d7668a0b714
-rwxrwxr-x 1 phlip9 phlip9 880M May  4 11:19 target/debug/deps/smoketest-bd637d7668a0b714

JESUS. RIP MY SSD.

$ rustup component add llvm-tools-preview
$ rust-size -A target/debug/deps/smoketest-bd637d7668a0b714 \
	| tail -n +2 \
	| sort --numeric-sort --key=2 \
	| numfmt --header=3 --to=iec-i --suffix=B --field=2
section                     size       addr
.fini_array                    8   63669912
.fini                         13   52345408
.init_array                   16   63669896
.plt.got                     24B    3608704
.init                        27B    3608576
.interp                      28B        848
.note.ABI-tag                32B        948
.note.gnu.property           32B        880
.debug_gdb_scripts           34B   55423896
.note.gnu.build-id           36B        912
.comment                     43B          0
.gnu.hash                    48B        984
.tdata                       72B   63669824
.plt                         96B    3608608
.rela.plt                   120B    3605536
.gnu.version                318B       7088
.gnu.version_r              432B       7408
.dynamic                    544B   64809064
.tbss                       696B   63669896
.bss                      2.1KiB   65535456
.dynstr                   2.2KiB       4848
.dynsym                   3.8KiB       1032
.debug_macro               12KiB          0
.data                      24KiB   65511424
.got                      686KiB   64809608
.data.rel.ro              1.1MiB   63669920
.gcc_except_table         1.4MiB   62213696
.eh_frame_hdr             1.5MiB   55423932
.debug_abbrev             2.3MiB          0
.rodata                   3.0MiB   52346880
.rela.dyn                 3.5MiB       7840
.debug_loc                3.5MiB          0
.debug_aranges            4.9MiB          0
.eh_frame                 5.1MiB   56953128
.debug_ranges              15MiB          0
.debug_line                28MiB          0
.text                      47MiB    3608768
.debug_pubnames           112MiB          0
.debug_str                175MiB          0
.debug_info               175MiB          0
.debug_pubtypes           275MiB          0
Total                     851MiB

WTF IS GOING ON WITH THE .debug_pubtypes SECTION???
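
To put a number on it: adding up the rounded .debug_* rows in the table gives roughly 790 of the 851 MiB, so something like 93% of this binary is debug info. A small sketch for computing that share from the raw `rust-size -A` output (sizes in bytes, before numfmt); debug_share is a hypothetical helper name:

```shell
# Sum the sizes of all .debug_* sections and print them as a share of the
# "Total" row. Expects raw `rust-size -A` output (sizes in bytes) on stdin.
debug_share() {
	awk '
		$1 ~ /^\.debug_/ { debug += $2 }
		$1 == "Total"    { total = $2 }
		END {
			if (total > 0)
				printf "%.0f of %.0f bytes (%.1f%%) are debug info\n", debug, total, 100 * debug / total
		}
	'
}
```

Usage would be something like `rust-size -A target/debug/deps/smoketest-bd637d7668a0b714 | debug_share`.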

Ok ok, let's take a look at what we're working with...

$ sudo apt install dwarfdump

$ dwarfdump --print-type --format-suppress-offsets target/debug/deps/smoketest-bd637d7668a0b714 \
	| head -n 10

.debug_pubtypes
 'ErrorData<alloc::boxed::Box<std::io::error::Custom, alloc::alloc::Global>>'
 'alloc::boxed::Box<std::io::error::Custom, alloc::alloc::Global>'
 'alloc::boxed::Box<(dyn core::error::Error + core::marker::Send + core::marker::Sync), alloc::alloc::Global>'
 'Result<(), std::io::error::Error>'
 'NonNull<u8>'
 'u8'
 'SimpleMessage'
 'ErrorKind'

How many types we got?

$ dwarfdump --print-type --format-suppress-offsets target/debug/deps/smoketest-bd637d7668a0b714 \
	| wc -l \
	| numfmt --to=si
2.3M

Maybe there's some giga types?

$ dwarfdump --print-type --format-suppress-offsets target/debug/deps/smoketest-bd637d7668a0b714 \
	| awk '{ print length, $0 }' \
	| sort -n -r \
	> smoketest_debug_pubtypes

$ head -n 10 smoketest_debug_pubtypes
72011  '{closure_env#0}<&str, &str, n ..
71962  '&mut (nom::sequence::terminat ..
71957  '(nom::sequence::terminated::{ ..
71957  '(nom::sequence::terminated::{ ..
66740  '&mut nom::branch::alt::{closu ..
66717  '{closure_env#0}<&str, &str, n ..
66717  '{closure_env#0}<&str, &str, n ..
66668  '&mut (nom::sequence::terminat ..
66663  '(nom::sequence::terminated::{ ..
66663  '(nom::sequence::terminated::{ ..

Ok despite nom taking the top 10, it looks like the primary culprit is my arch-nemesis warp. CURSE YOU WARP AND YOUR COMPOSABLE GENERICS.

Let's see what proportion of our .debug_pubtypes is warp...

$ cat smoketest_debug_pubtypes \
	| cut -d " " -f1 - \
	| awk '{sum += $1} END {print sum}' \
	| numfmt --to=iec-i --suffix=B
271MiB

$ grep "nom" smoketest_debug_pubtypes \
	| cut -d " " -f1 - \
	| awk '{sum += $1} END {print sum}' \
	| numfmt --to=iec-i --suffix=B
3.3MiB

$ grep "warp" smoketest_debug_pubtypes \
	| cut -d " " -f1 - \
	| awk '{sum += $1} END {print sum}' \
	| numfmt --to=iec-i --suffix=B
82MiB

$ grep "lightning" smoketest_debug_pubtypes \
	| cut -d " " -f1 - \
	| awk '{sum += $1} END {print sum}' \
	| numfmt --to=iec-i --suffix=B
45MiB

$ grep "proptest" smoketest_debug_pubtypes \
	| cut -d " " -f1 - \
	| awk '{sum += $1} END {print sum}' \
	| numfmt --to=iec-i --suffix=B
5MiB

$ grep "Vec" smoketest_debug_pubtypes \
	| cut -d " " -f1 - \
	| awk '{sum += $1} END {print sum}' \
	| numfmt --to=iec-i --suffix=B
71MiB

$ grep "hyper" smoketest_debug_pubtypes \
	| cut -d " " -f1 - \
	| awk '{sum += $1} END {print sum}' \
	| numfmt --to=iec-i --suffix=B
95MiB

$ grep "tokio" smoketest_debug_pubtypes \
	| cut -d " " -f1 - \
	| awk '{sum += $1} END {print sum}' \
	| numfmt --to=iec-i --suffix=B
95MiB
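
The copy-pasted pipelines above could be folded into a loop. A sketch, assuming the "LENGTH NAME" layout of smoketest_debug_pubtypes; pubtypes_bytes and pubtypes_report are hypothetical helper names:

```shell
# Sum the length column (field 1) of every line containing a fixed string.
# Usage: pubtypes_bytes PATTERN FILE
pubtypes_bytes() {
	grep -F -- "$1" "$2" | awk '{ sum += $1 } END { printf "%.0f\n", sum }'
}

# Print one "PATTERN BYTES" row per pattern.
# Usage: pubtypes_report FILE PATTERN...
pubtypes_report() {
	local file="$1" pat
	shift
	for pat in "$@"; do
		printf '%s %s\n' "$pat" "$(pubtypes_bytes "$pat" "$file")"
	done
}
```

For example: `pubtypes_report smoketest_debug_pubtypes nom warp lightning proptest Vec hyper tokio | numfmt --field=2 --to=iec-i --suffix=B`. Keep in mind that a line mentioning both warp and hyper is counted under both patterns, so the per-pattern totals overlap and need not add up to the overall 271MiB.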

A bunch of duplicates...
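
If "duplicates" here means identical type names showing up more than once, a rough way to quantify them is to count exact repeat lines. This is only a sketch (dup_stats is a hypothetical helper name), and it keeps every distinct line in memory, which gets heavy on a file this size:

```shell
# Count lines that are exact repeats of an earlier line, and the bytes of
# type names they account for. Expects "LENGTH NAME" lines in the given file.
dup_stats() {
	awk '
		# seen[$0]++ is 0 (false) the first time a line appears, so this
		# body only runs for the second and later copies
		seen[$0]++ { dup_lines++; dup_bytes += $1 }
		END { printf "%.0f duplicate lines, %.0f duplicate bytes\n", dup_lines, dup_bytes }
	' "$1"
}
```

Run it as `dup_stats smoketest_debug_pubtypes`.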

Checking out the .debug_str section:

$ dwarfdump --print-strings --format-suppress-offsets target/debug/deps/smoketest-bafe2762ec60a400 \
	| sort --numeric-sort --key=3 \
	| cut -b -80 \
	| tail -n 10
name: length 67206 is 'pin<futures_util::future::try_future::into_future::IntoFu
name: length 67220 is 'get_unchecked_mut<futures_util::future::try_future::into_
name: length 67222 is 'leak<alloc::sync::ArcInner<warp::filter::boxed::BoxingFil
name: length 67229 is 'from<futures_util::future::try_future::into_future::IntoF
name: length 67233 is 'into_pin<futures_util::future::try_future::into_future::I
name: length 67240 is 'new<alloc::boxed::Box<alloc::sync::ArcInner<warp::filter:
name: length 67257 is 'new_unchecked<alloc::boxed::Box<futures_util::future::try
name: length 71736 is 'choice<&str, &str, nom::error::Error<&str>, nom::sequence
name: length 134431 is 'into<&mut alloc::sync::ArcInner<warp::filter::boxed::Box
name: length 134508 is 'into<alloc::boxed::Box<futures_util::future::try_future: